1. Introduction

In  order  to  obtain  a  good  decision  rule  for  some  statistical  problem  we  start  by  making 
assumptions  concerning  the  class  of  distributions,  the  loss  function,  and  other  data  of 
the  problem.  Usually  these  assumptions  only  approximate  the  actual  conditions,  either 
because  the  latter  are  unknown,  or  in  order  to  simplify  the  mathematical  treatment  of 
the  problem.  Hence  the  assumptions  under  which  a  decision  rule  is  derived  are  ordinarily 
not  satisfied  in  a  practical  situation  to  which  the  rule  is  applied.  It  is  therefore  of  interest 
to  investigate  how  the  performance  of  a  decision  rule  is  affected  when  the  assumptions 
under  which  it  was  derived  are  replaced  by  another  set  of  assumptions. 

We  shall  confine  ourselves  to  the  consideration  of  assumptions  concerning  the  class  of 
distributions.  Investigations  of  particular  problems  of  this  type  are  numerous  in  the 
literature. There are many studies of the performance of “standard” tests under “nonstandard”
conditions, for example [3], where further references are given. Most of them
considered only the effect of deviations from the assumptions on the significance level of
the test. The relatively few studies of the effect on the power function include several
papers by David and Johnson, the latest of which is [6]. For some problems tests have
been proposed whose significance level is little affected by certain deviations from standard
assumptions, for instance R. A. Fisher’s randomization tests (see section 3; see also
Box and Andersen [4]). Some other relevant work will be mentioned later.

In sections 2, 3, and 4 we shall be concerned with problems of the following type. Let $P$
denote the joint distribution of the random variables under observation. Suppose that
we contemplate making the assumption that $P$ belongs to a class $\mathcal{P}_1$, but we admit the
possibility that actually $P$ is contained in another class, $\mathcal{P}_2$. The performance of a
decision rule (decision function) $d$ is assumed to be expressed by the given risk function
$r(P, d)$, defined for all $P \in \mathcal{P}_1 + \mathcal{P}_2$ and all $d$ in $D$, the class of decision rules available
to the statistician. Let $d_i$ be a decision rule which is optimal in some specified sense (for
example, minimax) under the assumption $P \in \mathcal{P}_i$, $i = 1, 2$. Suppose first that the
optimal rule $d_i$ is unique except for equivalence in $\mathcal{P}_1 + \mathcal{P}_2$, for $i = 1, 2$; that is, if $d_i'$ is
also optimal for $P \in \mathcal{P}_i$ then $r(P, d_i') = r(P, d_i)$ for all $P \in \mathcal{P}_1 + \mathcal{P}_2$. Then we may
assess the consequences of the assumption $P \in \mathcal{P}_1$ when actually $P \in \mathcal{P}_2$ by comparing
the values $r(P, d_1)$ and $r(P, d_2)$ for $P \in \mathcal{P}_2$. If the optimal rules are not unique, we
may pick out from the class of rules which are optimal for $P \in \mathcal{P}_1$ a subclass of rules
which come closest to optimality under the assumption $P \in \mathcal{P}_2$, and compare their
performance with that of the rules which are optimal under the latter assumption. In
some situations other ways of approaching the problem may be more adequate (see, for
example, section 2).

In section 2 the consequences of assuming that a distribution is continuous are
discussed. Problems involved in comparing assumptions of varying generality are
considered in section 3. Section 4 is concerned with cases where decision rules derived under
assumptions of normality retain their optimal properties when these assumptions are
relaxed.

The last three sections deal with distinguishable sets of distributions, a concept
related to the problem of the existence of unbiased or consistent tests under given
assumptions. Criteria for the distinguishability of two sets by means of a test based on finitely
many observations and by a sequential test are considered and their uses illustrated in
sections 5 and 7. An example where two sets are indistinguishable by a nonrandomized
test, but distinguishable by a randomized test, is discussed in section 6.

2.  The  assumption  of  a  continuous  distribution 

The assumption that we are dealing with a class of continuous distributions is usually
made when actually the observations are integer multiples of the unit of measurement $h$,
a (small) positive constant. Suppose that a sample $x = (x_1, \ldots, x_n)$ is a point in $R^n$, and
let $\mathcal{P}_1$ be a class of distributions (probability measures) which are absolutely continuous
with respect to $n$-dimensional Lebesgue measure. Let $S$ be the set of all points in $R^n$
whose coordinates are integer multiples of $h$. Let us suppose that when we say that the
distribution is $P_1 \in \mathcal{P}_1$, we “have in mind” that the distribution is $P_2 = f(P_1)$, where
the probability measure $P_2$ is defined by

(1)  $P_2(\{y\}) = P_1(\{x : y_i - h/2 < x_i \le y_i + h/2,\ i = 1, \ldots, n\})$

for all $y = (y_1, \ldots, y_n)$ in $S$. Let $\mathcal{P}_2 = \{f(P) : P \in \mathcal{P}_1\}$. Thus we are interested in the
consequences of assuming $P \in \mathcal{P}_1$ when actually $P \in \mathcal{P}_2$.
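
As an illustration of definition (1), the following minimal sketch computes the rounded distribution for n = 1. It assumes, purely for the example, that the continuous distribution is normal and that each rounding cell is the half-open interval of length h centered at a multiple of h; the function names are hypothetical.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def rounded_pmf(k, h, mu=0.0, sigma=1.0):
    """Mass that the rounded distribution puts on the point k*h (n = 1):
    the continuous probability of the cell (k*h - h/2, k*h + h/2]."""
    y = k * h
    return normal_cdf(y + h / 2, mu, sigma) - normal_cdf(y - h / 2, mu, sigma)

h = 0.1
# The cells for k = -100..100 cover (-10.05, 10.05], which carries
# essentially all of the standard normal mass.
total = sum(rounded_pmf(k, h) for k in range(-100, 101))
```

The rounded distribution lives entirely on the lattice S, a set of Lebesgue measure zero, which is why a rule's values on S are invisible to every continuous distribution.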

Let $d$ be a decision function which is optimal in some sense under the assumption
$P \in \mathcal{P}_1$. Then any decision rule which differs from $d$ only on the set $S$ is equivalent to $d$
for $P \in \mathcal{P}_1$. Since $P(S) = 1$ for all $P \in \mathcal{P}_2$, the mere fact that a rule is optimal for
$P \in \mathcal{P}_1$ does not tell us anything about its performance when $P \in \mathcal{P}_2$; indeed, it can be
as bad as we please under the latter assumption.1 Of course, in general there are rules
which are optimal under either assumption. But the main reason for making the simplifying
assumption of continuity is that we do not want to bother with rules which are
optimal for $P \in \mathcal{P}_2$. Now it is clear that if there is a determination $d'$ of $d$ which is
sufficiently regular, its risk at $P_2 = f(P_1)$ will differ arbitrarily little from the risk at $P_1$ if
$h$ is small enough; also, $d'$ may not be much worse than an optimal rule for $P \in \mathcal{P}_2$. We
shall not investigate here under what conditions such a regular decision rule exists or
how small $h$ has to be in order that the assumption of continuity cause little harm.
These questions may deserve attention. Fortunately, when a statistician applies a
decision rule, he is likely to choose the most regular determination available anyway.
However, the theoretical statistician might do well to be careful when he neglects sets of
measure zero.

1  The  author’s  attention  was  drawn  to  situations  of  this  kind  by  H.  Robbins  some  years  ago. 

3. Assumptions of varying generality

Suppose we consider making one of two assumptions, $P \in \mathcal{P}_1$ and $P \in \mathcal{P}_2$, where
$\mathcal{P}_1 \subset \mathcal{P}_2$. The second assumption is safer, but with the first assumption we may achieve
a smaller risk.

The consequences of making the broader assumption when actually the narrower
assumption is justified may be called serious if any decision rule which is “good” under the
broader assumption is “bad” under the narrower assumption. Thus the consequences
will depend on what we mean by a good decision rule. But even with a given definition
of “good” or “best” the consequences may depend on the class of decision rules at our
disposal. For example, suppose we require a minimax estimator of the mean $\mu$ of a
normal distribution when the loss function is the squared deviation from $\mu$, and we assume
that the variance $\sigma^2$ does not exceed a given number $A$. If we are restricted to estimators
based on a sample of fixed size, the minimax estimator is the sample mean $\bar{x}$ and does not
depend on $A$. On the other hand, if we are permitted to choose the sample size in
advance, and the cost of sampling is taken into account, the minimax estimator will
depend on $A$. If $A_2$ is substantially larger than $A_1$, the assumption $\sigma^2 \le A_2$ will give us a
unique minimax estimator whose performance is poor under the assumption $\sigma^2 \le A_1$.
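
The dependence on the variance bound can be made concrete with a small sketch. It assumes, as an illustration not taken from the text, that the cost of sampling is proportional to the sample size (`cost` per observation) and that the rule under comparison is the sample mean with a sample size fixed in advance; under those assumptions the maximum risk over variances not exceeding A is A/n + cost·n.

```python
import math

def max_risk(n, A, cost):
    """Maximum over sigma^2 <= A of E(xbar - mu)^2 + cost * n (assumed linear cost)."""
    return A / n + cost * n

def minimax_n(A, cost):
    """Integer sample size minimizing the maximum risk.
    The continuous minimizer is sqrt(A / cost); the function is convex in n,
    so it suffices to compare the two neighbouring integers."""
    n0 = max(1, math.floor(math.sqrt(A / cost)))
    return min((n0, n0 + 1), key=lambda n: max_risk(n, A, cost))

cost, A1, A2 = 0.01, 1.0, 100.0
n1 = minimax_n(A1, cost)   # sample size chosen under the narrower assumption
n2 = minimax_n(A2, cost)   # sample size chosen under the broader assumption
penalty = max_risk(n2, A1, cost) / max_risk(n1, A1, cost)
```

With these illustrative numbers the sample size chosen under the broader bound is ten times larger, and its maximum risk under the narrower bound is several times the attainable one, almost all of it sampling cost.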

Sometimes a considerable broadening of the assumption does not lead to serious
consequences when the narrower assumption is justified. Thus in the standard problems
concerning the variance of a normal distribution we need, when the mean is completely
unknown, just one more observation to obtain the same expected loss as when the mean
is known. Somewhat similar results have been obtained in certain cases where a
parametric class of distributions is enlarged to a nonparametric class. Several examples can
be found in [9]. For instance, consider the problem of testing whether two distributions
are equal (and not otherwise specified) against the alternative that the distributions are
normal with common variance and means $\mu_1 < \mu_2$. The uniformly most powerful similar
test, based on two random samples of fixed size, is asymptotically as powerful in large
samples (in a sense explained in [9]) as the corresponding standard test for testing the
equality of the means of two normal distributions. (The former test is of the randomization
type introduced by R. A. Fisher; its optimal properties were proved by Lehmann
and Stein [12].) Here we assumed that the class of alternatives is the same under both
assumptions. Actually the test retains its property of being uniformly most powerful
similar even when the class of alternatives is enlarged to a nonparametric class of
distributions of an exponential type (see Lehmann and Stein [12]). If the class is further
extended, a uniformly most powerful similar test will in general not exist, and it will be
necessary to specify against what types of alternatives the power of a test should be
large. This can be done in many ways, and an optimal test and its performance in the
class of normal distributions will depend on this specification.

4.  Nonparametric  justifications  of  assumptions  of  normality 

Given a decision rule $d$ which is optimal in a specified sense under the assumption
that $P$ is in a class $\mathcal{P}_1$, it is of interest to determine other classes $\mathcal{P}$ such that $d$ is optimal
(in the same or a suitably extended sense) under the assumption $P \in \mathcal{P}$. If optimal
means minimax, an obvious sufficient condition for $d$ to remain a minimax rule in $\mathcal{P} \supset \mathcal{P}_1$
is that the risk of $d$ in $\mathcal{P}$ attain its maximum in $\mathcal{P}_1$. Situations of this type were considered
by Hodges and Lehmann [8].

In certain cases we find that a decision rule derived under the assumption of a normal
distribution retains its optimal character in a large, nonparametric class of distributions.
One result of this type, concerning the minimax character of Markov estimators, can be
found in [8]. Similar though weaker results can be obtained in certain testing problems.

As an example consider the following extension of Student’s problem. Let $\mathcal{F}$ be the
class of distributions $F$ with finite mean $\mu(F)$, positive variance $\sigma^2(F)$ and such that

(2)  $\int |x - \mu(F)|^3 \, dF(x) \le M \sigma^3(F),$

where $M$ is fixed. Let $\mathcal{F}_\delta$ be the subclass of $\mathcal{F}$ with $\sqrt{n}\, \mu(F)/\sigma(F) = \delta$. We want to
test the hypothesis $F \in \mathcal{F}_\delta$, $\delta \le 0$, against the alternative $F \in \mathcal{F}_\delta$, $\delta > 0$. We restrict
ourselves to the class $D$ of tests $d$ based on $n$ independent observations from $F$, with
critical region $W = W(d)$. We choose the risk function

(3)  $r(F, d) = \begin{cases} a P(W | F) & \text{if } F \in \mathcal{F}_\delta,\ \delta \le 0, \\ b[1 - P(W | F)] & \text{if } F \in \mathcal{F}_\delta,\ \delta \ge \delta_1, \\ 0 & \text{elsewhere}, \end{cases}$

where $P(W | F)$ denotes the probability of $(X_1, \ldots, X_n) \in W$ when the $X_j$ are
independent with the common distribution $F$, and $a$, $b$, and $\delta_1$ are positive constants.

Let $d_0$ be the test with critical region $W_0 = \{t > c\}$, where

(4)  $t = \frac{\sqrt{n}\, \bar{x}}{s}, \quad \bar{x} = \frac{1}{n} \sum_{j=1}^{n} x_j, \quad s^2 = (n - 1)^{-1} \sum_{j=1}^{n} (x_j - \bar{x})^2,$

and the constant $c$ is determined by

(5)  $a[1 - S_{n-1}(c, 0)] = b S_{n-1}(c, \delta_1);$

here $S_{n-1}(y, \delta)$ denotes the noncentral Student distribution function with $n - 1$ degrees
of freedom and noncentrality parameter $\delta$. It can be shown by standard methods that $d_0$
is the minimax test in the subclass $\mathcal{F}^0$ of $\mathcal{F}$ which consists of the normal distributions.
By an inequality of Berry and Esseen (see, for example, [7]) the distribution function
$F_n(y)$ of $n^{1/2}[\bar{X} - \mu(F)]/\sigma(F)$ converges to the standard normal distribution
function $\Phi(y)$ uniformly for $F \in \mathcal{F}$ (and uniformly in $y$) as $n \to \infty$. Also, for any $\epsilon > 0$,
$P[\,|s/\sigma(F) - 1| < \epsilon \mid F] \to 1$ uniformly for $F \in \mathcal{F}$. Hence it can be shown that for any
real $\delta$ and for all $F \in \mathcal{F}_\delta$ we have

(6)  $|P(t \le y | F) - \Phi(y - \delta)| \le C_n(\delta), \quad -\infty < y < \infty,$

where $C_n(\delta)$ depends on $n$, $\delta$, and $M$ only and tends to 0 as $n \to \infty$, for $\delta$ fixed. It
follows that

(7)  $|P(t \le y | F) - S_{n-1}(y, \delta)| \le 2 C_n(\delta), \quad -\infty < y < \infty,$

for all $F \in \mathcal{F}_\delta$.

Now if $\mathcal{F}^*$ denotes the subclass of $\mathcal{F}$ with $\mu(F) = 0$ and $\sigma^2(F) = 1$, we have

(8)  $\sup_{F \in \mathcal{F}_\delta} P(t > c | F) = \sup_{F \in \mathcal{F}^*} P\left[ \frac{\sqrt{n}\, \bar{x} + \delta}{s} > c \,\Big|\, F \right],$

which is a nondecreasing function of $\delta$. The same is true of the infimum in $\mathcal{F}_\delta$. Hence
we obtain

(9)  $\sup_{F \in \mathcal{F}} r(F, d_0) \le \sup_{F \in \mathcal{F}^0} r(F, d_0) + \epsilon \le \inf_{d \in D} \sup_{F \in \mathcal{F}} r(F, d) + \epsilon,$

where $\epsilon = 2 \max\{a C_n(0), b C_n(\delta_1)\}$. Thus the maximum risk in $\mathcal{F}$ of Student’s test $d_0$
exceeds the minimax risk in $\mathcal{F}$ by at most $\epsilon$, where $\epsilon$ is arbitrarily small for $n$ sufficiently
large. (Note that the minimax risk is bounded away from zero as $n \to \infty$.)

In the corresponding problem with $\sigma(F) = \sigma$ fixed we find in a similar way the
stronger result that the maximum risk of the $\bar{x}$-test in $\mathcal{F}_\delta$ [the class with $\mu(F) = \delta\sigma/\sqrt{n}$]
lies within a small $\epsilon$ of its “normal” risk, uniformly in $\delta$. The argument which was used
above does not permit us to decide whether an analogous result is true when $\sigma(F)$ is
unrestricted.

The explanation for the near-optimal behavior of the “normal” decision rules in these
cases is, of course, the distribution-free character of the central limit theorem, combined
with the fact that the class $\mathcal{F}$ was so chosen as to make the approach to the normal
distribution uniform.
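
The approach to normality can be checked exactly in a simple non-normal case. The sketch below assumes Bernoulli observations (a distribution with finite third moment, chosen only for illustration) and computes the largest gap between the distribution function of the standardized sum and the standard normal distribution function; the helper names are hypothetical.

```python
import math

def normal_cdf(y):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def sup_distance(n, p):
    """sup over y of |P(standardized sum <= y) - Phi(y)| for n Bernoulli(p) variables.
    The supremum of a step function against a continuous one is attained
    at a jump, either at the jump value or at its left limit."""
    mean, sd = n * p, math.sqrt(n * p * (1 - p))
    cdf, worst = 0.0, 0.0
    for k in range(n + 1):
        phi = normal_cdf((k - mean) / sd)
        worst = max(worst, abs(cdf - phi))            # left limit at the jump
        cdf += math.comb(n, k) * p**k * (1 - p)**(n - k)
        worst = max(worst, abs(cdf - phi))            # value at the jump
    return worst

gaps = {n: sup_distance(n, 0.3) for n in (25, 100, 400)}
```

The gap shrinks as n grows; the third-moment restriction (2) is what makes such a bound hold uniformly over the whole class rather than for one distribution at a time.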

5.  Distinguishable  sets  of  distributions 

If we relax the assumptions more and more, the minimax risk will in general increase,
and eventually we may reach a point where the maximum risk of any decision rule is not
smaller than the risk of a rule which does not depend on the observations. We shall
consider criteria for recognizing when this or a similar situation occurs in testing problems.

Consider a testing (or two-decision) problem such that one or the other decision is
definitely preferred according as the distribution $P$ belongs to $\mathcal{P}^1$ or $\mathcal{P}^2$, two disjoint
subsets of the given class $\mathcal{P}$. Unless otherwise stated we assume that each $P$ in $\mathcal{P}$ is a
probability measure on $(X, \mathcal{A})$, where $X$ is the space of infinite sequences $x = (x_1, x_2, \ldots)$ of
real numbers and $\mathcal{A}$ is Kolmogorov’s extension to $X$ of the ordinary Borel field.

A test will be called finite if it depends only on a finite number of coordinates
(observations) $x_j$. By the critical function of a finite test we mean a measurable function $\psi$ from $X$
to the interval $[0, 1]$ such that $1 - \psi(x)$ [$\psi(x)$] is the probability of taking the decision
corresponding to $P \in \mathcal{P}^1$ [$P \in \mathcal{P}^2$] when $x$ is the sequence of observations.

Let $D$ be any class of finite tests, and let $\Psi$ be the class of the critical functions of
tests in $D$. We shall say that the sets $\mathcal{P}^1$ and $\mathcal{P}^2$ are distinguishable in $D$ if there exists a $\psi$
in $\Psi$ such that

(10)  $\sup_{P \in \mathcal{P}^1} E(\psi | P) + \sup_{P \in \mathcal{P}^2} E(1 - \psi | P) < 1,$

where $E(f | P)$ is the expected value of $f(X)$ when $X$ has the distribution $P$. Otherwise
$\mathcal{P}^1$ and $\mathcal{P}^2$ are said to be indistinguishable in $D$. [The property of $\psi$ expressed in (10) has
an obvious relation with unbiasedness.]

Let $D_f$ denote the class of all finite tests, and let $D_n$, $n = 1, 2, \ldots$, be the class of all
fixed sample size tests based on the observations $(x_1, \ldots, x_n)$. Two sets which are
distinguishable in $D_f$ will be called finitely distinguishable. We observe that two sets are
finitely distinguishable if and only if they are distinguishable in $D_n$ for some $n$.

Berger and Wald [2] gave conditions under which two sets of distributions are
distinguishable in the class of all nonrandomized tests in $D_n$ if and only if they are disjoint.
(Their theorem 3.1 is stated in a slightly more special form.)

A sufficient condition for two sets to be indistinguishable in $D_n$ can be stated as
follows. Let $X_n$ be the space of points $(x_1, \ldots, x_n)$, and let $\mathcal{A}_n$, $\mathcal{P}_n$ and $\mathcal{P}_n^i$ be the $\sigma$-field of
subsets of $X_n$ and the classes of distributions on $\mathcal{A}_n$ which are determined by $\mathcal{A}$, $\mathcal{P}$ and
$\mathcal{P}^i$ in an obvious way. For any two distributions $P_1$ and $P_2$ on $\mathcal{A}_n$ we denote by $\nu$ any
measure on $\mathcal{A}_n$ relative to which $P_1$ and $P_2$ are absolutely continuous, and by $p_1$ and
$p_2$ the respective densities (Radon-Nikodym derivatives). With this notation, the sets
$\mathcal{P}^1$ and $\mathcal{P}^2$ (or, equivalently, the sets $\mathcal{P}_n^1$ and $\mathcal{P}_n^2$) are indistinguishable in $D_n$ if for any
$\epsilon > 0$ there exist two distributions, $P_1 \in \mathcal{P}_n^1$ and $P_2 \in \mathcal{P}_n^2$, such that

(11)  $\int |p_1 - p_2| \, d\nu < \epsilon,$

where the integral extends over $X_n$. This follows from the inequality

(12)  $\inf_{P \in \mathcal{P}_n^2} \int \psi \, dP - \sup_{P \in \mathcal{P}_n^1} \int \psi \, dP \le \int \psi (p_2 - p_1) \, d\nu.$

The statement of the condition remains true in the more general form where $P_i$ is
any mixture of distributions in $\mathcal{P}_n^i$, with respect to some probability measure $\xi_i$ on a
$\sigma$-field of subsets of $\mathcal{P}_n^i$, subject to an obvious measurability condition. The proof is
similar and uses theorem 3 of Robbins [13]. A theorem of Le Cam (see Kraft [15])
implies that if the distributions in $\mathcal{P}_n^1$ and $\mathcal{P}_n^2$ are absolutely continuous with respect to a
fixed measure, the condition expressed in (11), with $P_1$ and $P_2$ mixtures, is also necessary
for the indistinguishability of $\mathcal{P}_n^1$ and $\mathcal{P}_n^2$.

With $S = \{x : p_1(x) > p_2(x)\}$ we have

(13)  $\tfrac{1}{2} \int |p_1 - p_2| \, d\nu = \sup_A |P_1(A) - P_2(A)| = P_1(S) - P_2(S).$

The  first  equation  (13)  shows  that  condition  (11)  is  independent  of  the  choice  of  v.  The 
last  expression  in  (13)  is  often  convenient  when  applying  this  condition. 
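
For distributions on a finite set the three expressions in (13) can be compared directly. The sketch below is only an illustration with made-up probability vectors; it evaluates the half L1 distance, the supremum over all subsets by brute force, and the difference on the set where the first density exceeds the second.

```python
from itertools import combinations

def tv_three_ways(p1, p2):
    """Return the three expressions of (13) for two probability vectors."""
    pts = range(len(p1))
    half_l1 = 0.5 * sum(abs(a - b) for a, b in zip(p1, p2))
    sup_over_sets = max(
        abs(sum(p1[i] - p2[i] for i in A))
        for r in range(len(p1) + 1)
        for A in combinations(pts, r)          # every subset A of the sample space
    )
    S = [i for i in pts if p1[i] > p2[i]]      # the set S = {p1 > p2}
    on_S = sum(p1[i] for i in S) - sum(p2[i] for i in S)
    return half_l1, sup_over_sets, on_S

vals = tv_three_ways([0.5, 0.3, 0.2, 0.0], [0.25, 0.25, 0.25, 0.25])
```

Since the middle expression makes no reference to a dominating measure, neither do the other two.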

It follows from an earlier remark that two sets $\mathcal{P}^1$ and $\mathcal{P}^2$ are finitely indistinguishable
if the condition expressed in (11) is satisfied for every $n$.

We shall say that $\mathcal{P}^1$ and $\mathcal{P}^2$ are finitely absolutely distinguishable if for any $\epsilon > 0$
there exists a finite test with critical function $\psi$ such that

(14)  $\sup_{P \in \mathcal{P}^1} E(\psi | P) + \sup_{P \in \mathcal{P}^2} E(1 - \psi | P) < \epsilon.$

This  property  has  also  been  expressed  by  saying  that  there  exists  a  uniformly  consistent 
sequence  of  tests  [1]. 

Now suppose that each $P$ in $\mathcal{P}$ is the distribution of a sequence of independent,
identically distributed random variables. Then if two sets are finitely distinguishable, they are
finitely absolutely distinguishable. This is a simple partial extension of a theorem of
Berger [1]; the theorem gives a necessary and sufficient condition for the existence of a
uniformly consistent sequence of nonrandomized tests. Further interesting results on the
existence of a uniformly consistent sequence of tests were recently obtained by Kraft [15].
We now give three examples of finitely indistinguishable sets.

Example 5.1. If $P$ is the distribution of independent, normal random variables with
mean $\mu$ and variance $\sigma^2$, and $\mathcal{P}^i$ is the set with $\mu = \mu_i$, $0 < \sigma^2 < \infty$, then $\mathcal{P}^1$ and $\mathcal{P}^2$ are
finitely indistinguishable. Condition (11) is satisfied for every $n$ if $P_i$ is the distribution
with $\mu = \mu_i$ and $\sigma$ sufficiently large. The corresponding result for tests with constant
power in $\mathcal{P}^1$ and $\mathcal{P}^2$ was proved by Dantzig [5] in 1940.
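
The claim of example 5.1 can be checked numerically. For n independent normal observations with a common standard deviation, the integral in (11) has a closed form in terms of the error function (a standard fact about normal distributions with equal covariance, not a formula from the text); the sketch below evaluates it, with hypothetical function and argument names.

```python
import math

def l1_distance(n, mu1, mu2, sigma):
    """Integral of |p1 - p2| for n i.i.d. N(mu_i, sigma^2) observations.
    The variation distance is 2*Phi(z) - 1 = erf(z / sqrt(2)) with
    z = sqrt(n) * |mu1 - mu2| / (2 * sigma); the L1 distance is twice that."""
    z = math.sqrt(n) * abs(mu1 - mu2) / (2.0 * sigma)
    return 2.0 * math.erf(z / math.sqrt(2.0))

small_sigma = l1_distance(10, 0.0, 1.0, 1.0)      # means well separated
large_sigma = l1_distance(10, 0.0, 1.0, 1.0e4)    # same means, huge variance
```

For any fixed n the distance can be driven below any positive bound by taking the common standard deviation large enough, which is what condition (11) requires.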

Example 5.2. If $P$ is the distribution of independent, normal random variables with
means $\mu_1, \mu_2, \ldots$ and common variance $\sigma^2$, and $\mathcal{P}^i$ the set with $\sigma = \sigma_i$, $-\infty < \mu_j < \infty$,
$j = 1, 2, \ldots$, then $\mathcal{P}^1$ and $\mathcal{P}^2$ are finitely indistinguishable. Here we can apply the
general form of condition (11). For if $P_i$ is the mixture of the $P$ in $\mathcal{P}_n^i$ according to $\xi_i$, where
under $\xi_i$ the means $\mu_1, \ldots, \mu_n$ are independent normal with zero mean and variance $\tau_i^2$,
such that $\sigma_1^2 + \tau_1^2 = \sigma_2^2 + \tau_2^2$, then $P_1 = P_2$.

Example 5.3. This is a further extension of Student’s problem (see section 4). Let $\mathcal{F}^i$
be the class of all distributions $F$ on the real line with finite mean $\mu(F)$ and positive
variance $\sigma^2(F)$ such that $\mu(F)/\sigma(F) = \gamma_i$, $\gamma_1 < \gamma_2$. Let $\mathcal{P}^i$ be the class of distributions of
independent random variables with common distribution $F \in \mathcal{F}^i$. Then $\mathcal{P}^1$ and $\mathcal{P}^2$ are
finitely absolutely distinguishable if $\gamma_1 < 0 < \gamma_2$, and finitely indistinguishable if
$\gamma_2 \le 0$ or $\gamma_1 \ge 0$.

If $\gamma_1 < 0 < \gamma_2$, it is easy to show with the aid of Chebyshev’s inequality that the
tests with critical functions $\psi_n(x) = 0$ or 1 according as $\sum_{j=1}^{n} x_j \le 0$ or $> 0$ form a
uniformly consistent sequence.
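
The Chebyshev argument gives an explicit rate. If the ratio of mean to standard deviation is γ, Chebyshev's inequality bounds the probability that the sum of n observations falls on the wrong side of zero by 1/(nγ²), so the sum of the two error probabilities is at most (γ₁⁻² + γ₂⁻²)/n. The sketch below, with hypothetical function names, evaluates this bound and the sample size it suggests.

```python
import math

def error_bound(n, gamma1, gamma2):
    """Chebyshev bound on the sum of the two error probabilities of the
    sign-of-the-sum test, uniform over both classes."""
    return (gamma1 ** -2 + gamma2 ** -2) / n

def sample_size_for(eps, gamma1, gamma2):
    """Smallest n for which the Chebyshev bound is strictly below eps."""
    return math.floor((gamma1 ** -2 + gamma2 ** -2) / eps) + 1

n_needed = sample_size_for(0.3, -0.5, 0.5)
```

The bound depends on the distribution only through γ₁ and γ₂, which is the uniformity required by (14); it degenerates as either ratio approaches zero.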

If $\gamma_1 \ge 0$, condition (11) is satisfied for every $n$ if $P_i$ is the distribution with $F = F_i$,
where $F_i$ ascribes probabilities $1 - \pi_i$ and $\pi_i = (1 + t_i^2)^{-1}$ to the respective points
$\gamma_i - t_i^{-1}$ and $\gamma_i + t_i$; here $t_2 > 0$, $t_1 = t_1(t_2)$ is the positive root (unique for $t_2$ small) of

(15)  $(1 - \gamma_2 t_2)\, t_1^2 + \gamma_1 (1 + t_2^2)\, t_1 - \gamma_2 t_2 - t_2^2 = 0,$

and $t_2 \to 0$. The case $\gamma_2 \le 0$ can be reduced to this case.
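
One way to read equation (15), offered here as an assumption because the extracted text does not spell it out, is as the condition that the two pairs of support points are proportional, so that after a change of scale (which leaves the ratio of mean to standard deviation unchanged) both two-point distributions sit on the same points and differ only in their probabilities. The sketch below checks this reading numerically for illustrative values; all names are hypothetical.

```python
import math

def t1_from_t2(t2, g1, g2):
    """Positive root t1 of (15), read as
    (1 - g2*t2)*t1**2 + g1*(1 + t2**2)*t1 - g2*t2 - t2**2 = 0."""
    A = 1.0 - g2 * t2
    B = g1 * (1.0 + t2 ** 2)
    C = g2 * t2 + t2 ** 2
    return (-B + math.sqrt(B * B + 4.0 * A * C)) / (2.0 * A)

def two_point(g, t):
    """Two-point distribution with mean g and variance 1:
    points (g - 1/t, g + t) with probabilities (1 - pi, pi), pi = 1/(1 + t^2)."""
    pi = 1.0 / (1.0 + t * t)
    return (g - 1.0 / t, g + t), (1.0 - pi, pi)

g1, g2, t2 = 0.5, 2.0, 1e-3            # illustrative values with 0 <= g1 < g2
t1 = t1_from_t2(t2, g1, g2)
(x1, y1), (_, pi1) = two_point(g1, t1)
(x2, y2), (_, pi2) = two_point(g2, t2)
scale = y2 / y1                        # assumed rescaling of the first distribution
gap = abs(scale * x1 - x2)             # zero when the rescaled supports coincide
tv = abs(pi1 - pi2)                    # variation distance once supports coincide
```

Under this reading the variation distance for a single observation is the difference of the two large probabilities, which tends to zero with the second parameter, and for n observations it is at most n times as large.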

6.  Sets  distinguishable  only  by  randomized  tests:  An  example 

Some results of Lehmann [11] suggest that two sets may be distinguishable in $D_n$ but
indistinguishable in the class $D_n'$ of nonrandomized tests in $D_n$. We shall consider a
problem where this situation occurs. We denote by $\Psi_n$ ($\Psi_n'$) the class of critical functions of the
tests in $D_n$ ($D_n'$). Thus if $\psi \in \Psi_n'$, $\psi(x) = 0$ or 1 for all $x$.

Let $\mathcal{F}_\mu$ be a class of distributions $F$ on the real line with mean $\mu$ and variance 1, which
contains all distributions with this property which assign probability 1 to at most three
points. Let $\mathcal{P}_{\mu,n}$ be the class of all distributions of $n$ independent random variables
with a common distribution in $\mathcal{F}_\mu$. We shall show that $\mathcal{P}_{\lambda,n}$ and $\mathcal{P}_{\mu,n}$ are
distinguishable in $D_n$ for all $\lambda \ne \mu$ and all $n = 1, 2, \ldots$, but indistinguishable in $D_n'$ for any $n$
unless $|\lambda - \mu|$ exceeds a positive constant (which depends on $n$). It is clearly sufficient
to take $\lambda = -h$, $\mu = h > 0$. We denote by $E(f | F)$ the expected value of $f(X)$ when
the components of $X$ are independent with the common distribution $F$.

We first prove the second part of the statement in the stronger form: For any $n$ and
for any $\alpha \in (0, 1)$ the inequalities

(16)  $\sup_{F \in \mathcal{F}_{-h}} E(\psi | F) \le \alpha \le \inf_{F \in \mathcal{F}_h} E(\psi | F)$

cannot both be satisfied with $\psi \in \Psi_n'$ unless $h$ exceeds a positive number which depends
only on $n$ (and is of order $n^{-1/2}$). If $\psi$ is in $\Psi_n'$ and satisfies the first inequality (16), we
must have

(17)  $\psi(y, \ldots, y) = 0 \quad \text{if } \alpha [1 + (y + h)^2]^n < 1,$

for all real $y$. For if $t = y + h \ne 0$, let $F'$ be the distribution (in $\mathcal{F}_{-h}$) which assigns the
probabilities $(1 + t^2)^{-1}$ and $1 - (1 + t^2)^{-1}$ to the respective points $t - h$ and $-t^{-1} - h$.
Then $\alpha \ge E(\psi | F') \ge \psi(t - h, \ldots, t - h)(1 + t^2)^{-n}$. This implies (17) for $y + h \ne 0$.
If $y + h = 0$, we use a similar argument with $F'$ any distribution in $\mathcal{F}_{-h}$ which assigns
to the point $-h$ a probability arbitrarily close to 1.

Similarly, for any $\psi \in \Psi_n'$ which satisfies the second inequality (16) we must have

(18)  $\psi(y, \ldots, y) = 1 \quad \text{if } (1 - \alpha) [1 + (y - h)^2]^n < 1,$

for all real $y$. Taking $y = -h$ and $y = h$, we find that a $\psi \in \Psi_n'$ cannot satisfy both
inequalities (16) if $[1 + (2h)^2]^n < \max[\alpha^{-1}, (1 - \alpha)^{-1}]$; and hence cannot satisfy them
for any $\alpha$ if $[1 + (2h)^2]^n < 2$. [This is not the best bound which can be obtained from (17)
and (18).]
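
The two ingredients of this argument are easy to check numerically: that the two-point distribution used in the proof of (17) has mean −h and variance 1, and how large h may be before the condition that [1 + (2h)²]ⁿ is below 2 fails. The sketch below uses hypothetical helper names and illustrative values.

```python
import math

def two_point_dist(t, h):
    """Distribution from the proof of (17): probabilities 1/(1+t^2) and
    1 - 1/(1+t^2) on the points t - h and -1/t - h."""
    p = 1.0 / (1.0 + t * t)
    return (t - h, -1.0 / t - h), (p, 1.0 - p)

def moments(points, probs):
    """Mean and variance of a finite discrete distribution."""
    m = sum(x * p for x, p in zip(points, probs))
    v = sum((x - m) ** 2 * p for x, p in zip(points, probs))
    return m, v

def h_threshold(n):
    """Value of h at which [1 + (2h)^2]^n equals 2; for smaller h no
    nonrandomized test can satisfy (16), whatever alpha is."""
    return 0.5 * math.sqrt(2.0 ** (1.0 / n) - 1.0)

mean, var = moments(*two_point_dist(0.7, 0.2))
scaled = h_threshold(100) * math.sqrt(100)
```

The scaled threshold is close to half the square root of log 2 for large n, in line with the stated order of magnitude.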

We now show that for any $h > 0$, any $n \ge 1$, and any $\alpha \in (0, 1)$ condition (16), with
at least one strict inequality, can be satisfied by a randomized test in $D_n$. Let $a = h n^{1/2}$,
$-a < c < a$,

(19)  $k(c) = a + c + (a - c)^{-1}, \quad b = -\frac{k(-c)}{2}, \quad d = \frac{k(c)}{2},$

(20)  $\phi(y) = \begin{cases} 0 & \text{if } y \le b, \\ \dfrac{y - b}{d - b} & \text{if } b < y < d, \\ 1 & \text{if } d \le y. \end{cases}$

If we let $\psi(x) = \phi\left(n^{-1/2} \sum_{j=1}^{n} x_j\right)$, we have

(21)  $\sup_{F \in \mathcal{F}_{-h}} E(\psi | F) \le \frac{1}{2 a k(c)} < 1 - \frac{1}{2 a k(-c)} \le \inf_{F \in \mathcal{F}_h} E(\psi | F)$

for $|c| < a$. As $c$ increases from $-a$ to $a$, either side of (21) decreases continuously
from 1 to 0.

We sketch the proof of (21). Let $f(y)$ be any polynomial of the second degree such
that $\phi(y) \le f(y)$ for all real $y$. If $g(x) = f\left(n^{-1/2} \sum x_j\right)$, then $E(\psi | F) \le E(g | F)$, and
$E(g | F)$ is constant in $\mathcal{F}_\mu$ for each $\mu$. Now choose $f$ so as to minimize $E(g | F)$, $F \in \mathcal{F}_{-h}$.

7. Sequentially distinguishable sets of distributions

We shall restrict ourselves to sequences of independent random variables with a
common distribution $F$. Suppose that $F \in \mathcal{F}$, and let $\mathcal{F}^1$ and $\mathcal{F}^2$ be two disjoint subsets
of $\mathcal{F}$. Let $D_s = D_s(\mathcal{F})$ be the class of all sequential tests for taking one of two decisions,
$a_1$ and $a_2$, which terminate with probability one for all $F \in \mathcal{F}$. We denote by $\Pr\{a_i | F, d\}$
and $E(n | F, d)$, respectively, the probability of the decision $a_i$ and the expected number of
observations required to reach a decision when the distribution is $F$ and test $d$ is used.

The sets $\mathcal{F}^1$ and $\mathcal{F}^2$ will be called sequentially distinguishable (indistinguishable) at
$F$ if there exists (does not exist) a $d$ in $D_s$ such that $E(n | F, d) < \infty$ and

(22)  $\sup_{F \in \mathcal{F}^1} \Pr\{a_2 | F, d\} + \sup_{F \in \mathcal{F}^2} \Pr\{a_1 | F, d\} < 1.$
If the left side of (22) is arbitrarily small for some $d$ in $D_s$ with $E(n | F, d) < \infty$, then
$\mathcal{F}^1$ and $\mathcal{F}^2$ are said to be sequentially absolutely distinguishable at $F$. If $\mathcal{F}^1$ and $\mathcal{F}^2$ are
sequentially [absolutely] distinguishable (indistinguishable) at every $F$ in a class $\mathcal{F}^*$,
then $\mathcal{F}^1$ and $\mathcal{F}^2$ will be said to be sequentially [absolutely] distinguishable
(indistinguishable) in $\mathcal{F}^*$.

Note that these definitions are stated in terms of the sets $\mathcal{F}^1$ and $\mathcal{F}^2$ rather than in
terms of the corresponding sets of distributions of sequences. Statements such as “$\mathcal{F}^1$ and
$\mathcal{F}^2$ are finitely indistinguishable” will have an obvious meaning in this context.

A sufficient condition for two sets to be sequentially indistinguishable is implied by
an inequality proved in [10]. Let $F_1 \in \mathcal{F}^1$, $F_2 \in \mathcal{F}^2$, $F \in \mathcal{F}$, and let $\nu$ be a measure
relative to which these three distributions are absolutely continuous, with respective
densities $f_1$, $f_2$, and $f$. By a trivial extension of equation (4) in [10], if $d$ is any test in $D_s$ such
that

(23)  $\sup_{F \in \mathcal{F}^1} \Pr\{a_2 | F, d\} \le \alpha_1, \quad \sup_{F \in \mathcal{F}^2} \Pr\{a_1 | F, d\} \le \alpha_2,$

where $\alpha_1 > 0$, $\alpha_2 > 0$, $\alpha_1 + \alpha_2 < 1$, then

(24)  $E(n | F, d) \ge \frac{-\log\left[\alpha_1^c (1 - \alpha_2)^{1-c} + (1 - \alpha_1)^c \alpha_2^{1-c}\right]}{c \int f \log \frac{f}{f_1} \, d\nu + (1 - c) \int f \log \frac{f}{f_2} \, d\nu}$

for $0 < c < 1$, where the integrals are taken over the entire space. If, in particular,
$F \in \mathcal{F}^1$, the right side of (24) is maximized with $F_1 = F$ and $c \to 1$ (the limit in which the
term $\int f \log(f/f_1) \, d\nu = 0$ carries the whole weight), and we obtain


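(25)  $E(n | F, d) \ge \frac{\alpha_1 \log \frac{\alpha_1}{1 - \alpha_2} + (1 - \alpha_1) \log \frac{1 - \alpha_1}{\alpha_2}}{\int f \log \frac{f}{f_2} \, d\nu} \quad \text{if } F \in \mathcal{F}^1.$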


We note that the numerators and denominators in (24) and (25) are positive; the
denominators may be infinite.
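
The relation between the two bounds can be checked numerically. The sketch below assumes the following reading of the damaged display (24): the exponent c is attached to α₁ and to the integral involving f₁, and 1 − c to α₂ and to the integral involving f₂. Under that reading, taking F₁ = F makes the first integral vanish, and the bound increases to (25) as the weight c on that integral tends to 1. The function names and the numerical values are hypothetical.

```python
import math

def bound24(a1, a2, c, e1, e2):
    """Right side of (24) under the reading described above.
    e1, e2 are the integrals of f*log(f/f1) and f*log(f/f2)."""
    num = -math.log(a1**c * (1 - a2)**(1 - c) + (1 - a1)**c * a2**(1 - c))
    return num / (c * e1 + (1 - c) * e2)

def bound25(a1, a2, e2):
    """Right side of (25)."""
    num = a1 * math.log(a1 / (1 - a2)) + (1 - a1) * math.log((1 - a1) / a2)
    return num / e2

a1, a2, e2 = 0.05, 0.10, 0.02        # illustrative error bounds and divergence
limit = bound25(a1, a2, e2)
near_limit = bound24(a1, a2, 1 - 1e-6, 0.0, e2)   # F1 = F, so e1 = 0
```

Because the numerator of (25) is fixed by the error bounds, the lower bound on the expected sample size grows without limit as the divergence in the denominator shrinks, which is the mechanism behind condition (26).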

Hence if for any positive number $M$ and any two positive numbers $\alpha_1$ and $\alpha_2$ with
$\alpha_1 + \alpha_2 < 1$ the distributions $F_1 \in \mathcal{F}^1$ and $F_2 \in \mathcal{F}^2$ and the number $c$ can be so chosen
that the right side of (24) exceeds $M$, the sets $\mathcal{F}^1$ and $\mathcal{F}^2$ are sequentially
indistinguishable at $F$. If $F \in \mathcal{F}^1$, the two sets are sequentially indistinguishable at $F$ if for any
$\epsilon > 0$ we can find an $F_2 \in \mathcal{F}^2$ such that

(26)  $\int f \log \frac{f}{f_2} \, d\nu < \epsilon.$


By example 5.1 two sets of normal distributions with fixed means and unrestricted
variances are finitely indistinguishable. On the other hand, by a well-known result of
Stein [14], these sets are sequentially absolutely distinguishable in the class of all normal
distributions. However, if the requirement $E(n | F, d) < \infty$ is replaced by the stronger
condition that $E(n | F, d) = E(n | \mu, \sigma; d)$ be bounded in $\sigma$ for $\mu$ fixed, inequality (24)
easily implies that condition (22) cannot be satisfied.

As an application of condition (26) we shall show that the sets $\mathcal{F}^1$ and $\mathcal{F}^2$ of example
5.3, with $\gamma_1 = 0 < \gamma_2$, are sequentially indistinguishable in $\mathcal{F}^2$. Let $F$ be any
distribution in $\mathcal{F}^2$, so that $\mu(F)/\sigma(F) = \gamma_2$. Let $F_1 = (1 - t)F + tG$, where $0 < t < 1$ and $G$ is
the distribution which assigns probability one to the point $a = -\mu(F)(1 - t)/t$. Then
$F_1 \in \mathcal{F}^1$. Both $F_1$ and $F$ are absolutely continuous relative to $\nu = F_1$, with respective
densities $f_1(x) = 1$ and

(27)  $f(x) = \begin{cases} \dfrac{1}{1 - t} & \text{if } x \ne a, \\ \dfrac{b}{b + t - bt} & \text{if } x = a, \end{cases}$

where $b = b(t)$ is the $F$-probability of the point $a$. Hence

(28)  $\int f \log \frac{f}{f_1} \, d\nu = -(1 - b) \log (1 - t) + b \log \frac{b}{b + t - bt},$

where the last term is to be omitted if $b = 0$. The right side of (28) tends to 0 as $t \to 0$.
Thus condition (26) (with $\mathcal{F}^1$ and $\mathcal{F}^2$ interchanged) can be satisfied for any $\epsilon > 0$.
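
The construction can be verified on a small discrete case. The sketch below takes a hypothetical two-point distribution with positive mean, forms the contaminated distribution by mixing in a point mass chosen to make the mean zero, computes the divergence directly from the point masses, and compares it with the closed form as read from the damaged display (28); the coefficient of the first term is taken to be one minus the mass at the contamination point, which is an assumption of this sketch.

```python
import math

def kl_direct(F, t, tol=1e-9):
    """Divergence of F from F1 = (1-t)F + tG, where G is a point mass at
    a = -mu(F)(1-t)/t. F is a list of (point, probability) pairs.
    Returns (divergence, mass of F at a, mean of F1)."""
    mu = sum(x * p for x, p in F)
    a = -mu * (1 - t) / t
    kl, b = 0.0, 0.0
    for x, p in F:
        at_a = abs(x - a) < tol                 # does F have an atom at a?
        b += p if at_a else 0.0
        kl += p * math.log(p / ((1 - t) * p + (t if at_a else 0.0)))
    return kl, b, (1 - t) * mu + t * a

def kl_formula(b, t):
    """Closed form (28) as read here; the last term is omitted when b = 0."""
    last = b * math.log(b / (b + t - b * t)) if b > 0 else 0.0
    return -(1 - b) * math.log(1 - t) + last

F = [(-2.0, 0.25), (4.0, 0.75)]                 # mean 2.5, positive mean/sd ratio
kl, b, mean_F1 = kl_direct(F, 2.5 / 4.5)        # this t puts the point a on the atom -2
kl_small, b_small, _ = kl_direct(F, 0.001)      # small t: a is far out, so b = 0
```

For small t the divergence is close to t, so it can be made smaller than any positive bound while the contaminated distribution keeps mean zero.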

The proof shows that this result still holds if $\mathcal{F}^1$ and $\mathcal{F}^2$ consist only of the mixtures
$H = (1 - t)F + tG$ of a normal distribution $F$ and an arbitrary distribution $G$, where
$0 \le t < \epsilon$ and $\epsilon$ is positive and as small as we please. The distributions $H$ are, in a sense,
very close to normal distributions.